Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning
نویسندگان
چکیده
Numerous tools have been developed to aggressively block the execution of popular JavaScript programs in Web browsers. Such blocking also affects functionality of webpages and impairs user experience. As a consequence, many privacy preserving tools that have been developed to limit online tracking, often executed via JavaScript programs, may suffer from poor performance and limited uptake. A mechanism that can isolate JavaScript programs necessary for proper functioning of the website from tracking JavaScript programs would thus be useful. Through the use of a manually labelled dataset composed of 2,612 JavaScript programs, we show how current privacy preserving tools are ineffective in finding the right balance between blocking tracking JavaScript programs and allowing functional JavaScript code. To the best of our knowledge, this is the first study to assess the performance of current web privacy preserving tools. To improve this balance, we examine the two classes of JavaScript programs and hypothesize that tracking JavaScript programs share structural similarities that can be used to differentiate them from functional JavaScript programs. The rationale of our approach is that web developers often “borrow” and customize existing pieces of code in order to embed tracking (resp. functional) JavaScript programs into their webpages. We then propose one-class machine learning classifiers using syntactic and semantic features extracted from JavaScript programs. When trained only on samples of tracking JavaScript programs, our classifiers achieve an accuracy of 99%, where the best of the privacy preserving tools achieved an accuracy of 78%. The performance of our classifiers is comparable to that of traditional two-class SVM. One-class classification, where a training set of only tracking JavaScript programs is used for learning, has the advantage that it requires fewer labelled examples that can be obtained via manual inspection of public lists of well-known trackers. We further test our classifiers and several popular privacy preserving tools on a larger corpus of 4,084 websites with 135,656 JavaScript programs. The output of our best classifier on this data is between 20 to 64% different from the tools under study. We manually analyse a sample of the JavaScript programs for which our classifier is in disagreement with all other privacy preserving tools, and show that our approach is not only able to enhance user web experience by correctly classifying more functional JavaScript programs, but also discovers previously unknown tracking services.
منابع مشابه
Online multiple people tracking-by-detection in crowded scenes
Multiple people detection and tracking is a challenging task in real-world crowded scenes. In this paper, we have presented an online multiple people tracking-by-detection approach with a single camera. We have detected objects with deformable part models and a visual background extractor. In the tracking phase we have used a combination of support vector machine (SVM) person-specific classifie...
متن کاملDetecting and Defending Against Third-Party Tracking on the Web
While third-party tracking on the web has garnered much attention, its workings remain poorly understood. Our goal is to dissect how mainstream web tracking occurs in the wild. We develop a client-side method for detecting and classifying five kinds of third-party trackers based on how they manipulate browser state. We run our detection system while browsing the web and observe a rich ecosystem...
متن کاملLike a Pack of Wolves: Community Structure of Web Trackers
Web trackers are services that monitor user behavior on the web. The information they collect is ostensibly used for customization and targeted advertising. Due to rising privacy concerns, users have started to install browser plugins that prevent tracking of their web usage. Such plugins tend to address tracking activity by means of crowdsourced filters. While these tools have been relatively ...
متن کاملMBest Struct: M-Best diverse sampling for structured tracker
We approach the problem of model-free visual tracking of objects in videos. Modelfree tracking has its state-of-the-art in a class of methods called tracking-by-detection, as shown in recent benchmarks. Some top-performing methods use deep neural networks (i.e., convnets) to solve the learning-based steps of the tracking algorithm (e.g., bounding-box prediction and evaluation). Despite improvin...
متن کاملMBestStruck: M-Best diverse sampling for structured tracker
We approach the problem of model-free visual tracking of objects in videos. Modelfree tracking has its state-of-the-art in a class of methods called tracking-by-detection, as shown in recent benchmarks. Some top-performing methods use deep neural networks (i.e., convnets) to solve the learning-based steps of the tracking algorithm (e.g., bounding-box prediction and evaluation). Despite improvin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PoPETs
دوره 2017 شماره
صفحات -
تاریخ انتشار 2017